Abstract

This paper explores a co-evolutionary ap- 
proach applicable to difficult problems with 
limited failure/success performance feedback. 

Like familiar “predator-prey” frameworks, this algorithm evolves two
populations of individuals: the solutions (predators) and the
problems (prey). The approach extends previous work by rewarding only
the problems that match their difficulty to the level of solution
competence. In complex problem domains with limited feedback, this
“tractability constraint” helps provide an adaptive fitness gradient
that effectively differentiates the candidate solutions. The algorithm
generates selective pressure toward the evolution of increasingly
competent solutions by rewarding solution generality and uniqueness
and problem tractability and difficulty. Relative (inverse-fitness)
and absolute (static objective function) approaches to evaluating
problem difficulty are explored and discussed.

On a simple control task, this co-evolutionary 
algorithm was found to have significant ad- 
vantages over a genetic algorithm with either 
a static fitness function or a fitness function 
that changes on a hand-tuned schedule. 

1 Theoretical Background 

Traditional evolutionary algorithms measure the fitness
of an individual by its ability to minimize an
objective function which is typically static and
independent of the evolutionary algorithm. For exam- 
ple, if the goal is to evolve a posture controller for 
a robot, the fitness of an individual controller could 
be its success in minimizing movement in the robot 
body under a gravity load. In co-evolutionary algo- 
rithms, the fitness of an individual in the evolving 
population(s) depends on interactions with other in- 
dividuals in the same generation. The problems (e.g. 
the forces on the robot) faced by individuals in a co- 
evolutionary algorithm are dynamic and are shaped by 
the algorithm itself. Extending the robot example, the 
situations that a robot faces (e.g. forces on the robot 
like gravity) could be co-evolving with the controllers 
such that the set of situations on which controllers are 
evaluated changes from generation to generation. 

1.1 Co-evolution: Competition and
Cooperation 

A growing body of research explores co-evolutionary approaches that
capitalize on this dynamic quality (for review, see Paredis, 1998).
This co-evolutionary work has largely concentrated on competitive
interactions. The interactions can be between individuals that
compete in a symmetric game-like context (Pollack et al., 1996; Sims,
1994; Rosin, 1997) or between populations of different types of
individuals that compete in predator/prey type relationships (Hillis,
1991; Paredis, 1994b; Paredis, 1994a; Cliff & Miller, 1996; Juille &
Pollack, 1998; Rosin, 1997; Rosin & Belew, 1996). In these cases,
individuals are rewarded if they defeat the individuals with which
they compete. These interactions can support “arms-races” in which the
individuals force each other to become increasingly competent.

A few studies have investigated the role of coopera- 
tion and how it can help solve some problems endemic 
to evolutionary methods, like the difficulty of choos- 
ing an appropriate encoding for the individuals (Pare- 
dis, 1995) and the difficulty of decomposing composite 
problems (De Jong & Potter, 1995). Other studies have
found that a balance of cooperation and competition 
is necessary to prevent evolutionary algorithms from 
getting trapped in local minima, or “Mediocre Stable 
States” (Ficici, 1995). 

1.2 The Current Approach 

The approach outlined in this paper has features of both competitive
and cooperative co-evolutionary approaches. The algorithm tries to
ensure a tractable learning gradient for the solutions by rewarding
only those problems on which at least one solution was successful. The
fitness of these tractable problems is proportional to their absolute
and/or relative difficulty, providing pressure for the solutions to
become more generally competent. In practice, this requirement
generates an initial simplification and gradual increase in problem
difficulty over evolution. The aim is to select for problems that are
on the edge of what is solvable by the current population of
solutions, ensuring a useful fitness gradient throughout evolution.

This requirement that the problem must be tractable 
has been relatively unstressed in the literature, with 
a couple of notable exceptions. Rosin (1997) suggests
a mechanism (the “Phantom Parasite”) that rewards 
problems that are solvable by at least one solution. 
This mechanism will tend to allow easy problems to 
survive in a population of very difficult problems. 
Juille and Pollack (1998) use a domain-specific approach
to selecting for problem tractability by rewarding
problems that tend to be easier by an objective
measure. 

Dealing with problem tractability is not an issue in 
problem domains where the problems provide partial 
fitness measures (Hillis, 1991; Ficici, 1995) or have a 
baseline success rate that is fairly high, akin to a mul- 
tiple choice problem (Juille & Pollack, 1998; Paredis, 
1994b). In these cases, there is always a fitness gradi- 
ent for the solutions to follow in the form of the number 
of problems solved. However, in many real problem 
domains the performance of a set of randomly chosen 
solutions on a randomly chosen problem would be so 
low that an observer or fitness function would be un- 
able to differentiate between the performance of the 
candidate solutions. 


1.3 Difficult Tasks 

Many problems require a surprisingly high level of expertise
to even be approached. Faced with such a problem, a naive
learner must be given some bias, or a structured learning
environment (termed a “gradient engineered fitness
landscape” by Ficici and Pollack, 1998)
to have a hope of mastering the task. In developmen- 
tal terms, the current task must be kept in the “Zone 
of Proximal Development” (Vygotsky, 1986), or ZPD, 
in order to be tractable and useful to the learner. If 
the problem is outside the ZPD, then the learner will 
be unable to gain competence through experience with 
the task. In evolutionary terms, a fitness function that 
is too far beyond the competency of the individuals will 
fail to usefully differentiate between the individuals, 
and evolution will be unable to select for competency. 

The challenge of staying within the ZPD is especially 
relevant in difficult reinforcement learning problems. 
In these problems: there is an absolute measure of 
performance (as opposed to a game with relative per- 
formance), the measure of performance is mainly lim- 
ited to success/failure, and the baseline probability of 
success for a solution given a typical problem is very 
low. For example, the control or design of a complex 
structure like an automobile engine depends on many 
pieces coming together in just the right way before any 
success at all is achieved. This seemingly impossible 
design task has only been tractable because the task 
itself has evolved over history. Originally the task was 
simply to translate heat into rotational energy. Details 
that are crucial to current engines like gearing, internal
combustion, carburetion, etc. were only added
as each progressively more complex design was real- 
ized. In this paper we explore some mechanisms that 
could help make complex problems tractable to evolu- 
tionary algorithms by providing a gradient of problem 
difficulty/complexity over evolution.

2 Problem/Control Framework: 2D 
Free-Space Vehicle 

This work uses a relatively simple simulation framework that allows
for quick exploration of co-evolutionary mechanisms. The problem is to
control the thrusters on a craft floating in free space such that the
craft goes to a given point (the “origin”) and comes to rest within a
given period of time (1 sec, 5 time-steps). The movement of the craft
is limited to 2 dimensions, and is simulated approximately using
discrete time-steps. A solution “succeeds” if the craft is resting
(within some error) at the origin at the end of the time period;
otherwise it “fails”. This method of evaluation converts the available
continuous error signals to a reinforcement learning signal.

Problem difficulty can be easily parameterized in this 
framework. An optimal solution in this framework 
would be able to steer the craft toward the origin from 
any position and initial velocity and would learn to 
stop at the origin within the time period. Because so- 
lution performance is evaluated over a limited period 
of time, a large initial distance and velocity require the 
solution to generate strong and accurate thruster fir- 
ing. In contrast, small initial distances and velocities 
can be successfully navigated with weak and relatively 
inaccurate thruster firing (see Sec. 5.1 for limitations 
of this interpretation). In general, problem difficulty
is proportional to the craft’s initial distance (D_p) from
the origin and initial velocity (D_v).

Candidate solutions in this simulation are simple linear 
networks where the change in the XY thrust at the
next time step is a weighted sum of the current XY
thrust, velocity, and position. 1 A candidate solution 
is a set of weights for this network. 
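
The controller and episode described above can be sketched as follows. This is a minimal illustration, not the simulator used in the paper: the unit mass, the explicit Euler integration, the thrusters starting at zero, the layout of the 12 weights as a 2x6 matrix, and the tolerance `TOL` are all assumptions, and the names `simulate` and `succeeds` are hypothetical.

```python
import math

DT = 0.2     # time step in seconds (assumed: 5 steps span the 1 sec episode)
STEPS = 5    # time-steps per evaluation, as stated in the text
TOL = 1.0    # hypothetical success tolerance on final distance and speed


def simulate(weights, pos, vel, steps=STEPS, dt=DT):
    """Run one episode and return the final (position, velocity).

    weights: 12 floats read as a 2x6 matrix that maps the state
             [thrust_x, thrust_y, vel_x, vel_y, pos_x, pos_y]
             to the change in (thrust_x, thrust_y).
    """
    px, py = pos
    vx, vy = vel
    tx, ty = 0.0, 0.0  # thrusters start off (assumption)
    for _ in range(steps):
        state = (tx, ty, vx, vy, px, py)
        # Change in thrust is a weighted sum of thrust, velocity, position.
        dtx = sum(w * s for w, s in zip(weights[0:6], state))
        dty = sum(w * s for w, s in zip(weights[6:12], state))
        tx, ty = tx + dtx, ty + dty
        # Unit mass and explicit Euler integration (both assumptions).
        vx, vy = vx + tx * dt, vy + ty * dt
        px, py = px + vx * dt, py + vy * dt
    return (px, py), (vx, vy)


def succeeds(weights, pos, vel, tol=TOL):
    """Binary success signal: at rest (within tol) at the origin at the end."""
    (px, py), (vx, vy) = simulate(weights, pos, vel)
    return math.hypot(px, py) <= tol and math.hypot(vx, vy) <= tol
```

Collapsing the continuous final error into the boolean returned by `succeeds` is what turns the task into the success/failure setting the paper is concerned with.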

3 Evolutionary Framework: 
Co-evolutionary GA 

Each problem is described by 2 scalars (see Fig. 1): 
initial distance from the origin (D_p), and initial
velocity (D_v). The actual position and velocity of each
problem in each generation is chosen randomly from
the points on the circles described by the two problem
scalars; thus at each generation a problem describes
an initial XY position and velocity. In this way the
difficulty of the problems (the magnitude of the prob- 
lem scalars) can be preserved or changed from gen- 
eration to generation, while the specific problems are 
randomly sampled each generation. 
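
As a rough sketch of this sampling step, a problem's two scalars can be turned into a concrete start state by drawing a direction on each circle. The function name `sample_initial_conditions` is hypothetical, and drawing the two angles independently and uniformly is an assumption about how "chosen randomly from the points on the circles" was implemented.

```python
import math
import random


def sample_initial_conditions(d_p, d_v, rng=random):
    """Draw one concrete start state for a problem (d_p, d_v).

    The magnitudes d_p and d_v are kept exactly; only the directions
    of the position and velocity vectors are random.
    """
    a_p = rng.uniform(0.0, 2.0 * math.pi)  # direction of the start position
    a_v = rng.uniform(0.0, 2.0 * math.pi)  # direction of the start velocity
    pos = (d_p * math.cos(a_p), d_p * math.sin(a_p))
    vel = (d_v * math.cos(a_v), d_v * math.sin(a_v))
    return pos, vel
```

Because only the magnitudes are inherited, a problem keeps its difficulty across generations while the specific start state it presents is resampled each time.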

Figure 1: The figure represents the possible initialization conditions
represented by a given problem (D_p, D_v). The square at the center of
the large circle is the origin. D_p is the difficulty/distance of the
initial position, D_v is the difficulty/magnitude of the initial
velocity. The actual initial configuration, a starting position in the
case of D_p (the star) and a velocity vector in the case of D_v, is
chosen randomly from the set of points on the circles. The dotted
circles represent other possible starting positions (dotted stars)
with their associated sets of possible velocity vectors.

Each generation, every solution is evaluated on every problem. The
weights of the solutions/networks are evolved using a genetic
algorithm. 2 In simulations, an initial population (N = 50) of
solutions is chosen at random with relatively small weights ([-.05,
.05]). These solutions are then evaluated on the set of problems
(N = 50) present that generation. The genomes of the solutions are
lists of the 12 floating point weights in the network. Each
generation, new candidate solutions are generated by probabilistically
choosing parents (based on their sigma-scaled fitness, Mitchell,
1996), re-combining them in pairs via 2-point cross-over, and with
some probability (10% mutation rate) mutating each weight of the new
solutions by adding a random number (selected from [-1,1]). The best
5% of the solutions at each generation are replicated exactly in the
following generation.

1 Some simulations were run using a feed-forward neural network
architecture (2 layers, 2 hidden units with a hyperbolic transfer
function). These simulations yielded qualitatively similar results.

2 Although in this case the control networks could be trained using
backpropagation or a similar neural network training algorithm, a
genetic algorithm was used to find effective weights so as to explore
co-evolutionary mechanisms.
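
The selection and variation steps of the genetic algorithm can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' code: the sigma-scaling form (expected value 1 + (f - mean) / (2 * sd), with negative values reset to 0.1) follows the usual textbook description, the 5% elite count is rounded down, and all function names are hypothetical.

```python
import random
import statistics


def sigma_scale(raw, floor=0.1):
    """Sigma-scaled selection weights; assumed form, see lead-in."""
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)
    if sd == 0:
        return [1.0] * len(raw)          # no spread: everyone equally likely
    scaled = [1.0 + (f - mean) / (2.0 * sd) for f in raw]
    return [s if s > 0 else floor for s in scaled]


def two_point_crossover(a, b, rng):
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(genome, rng, rate=0.10, step=1.0):
    """With probability `rate`, add a number from [-step, step] to a weight."""
    return [w + rng.uniform(-step, step) if rng.random() < rate else w
            for w in genome]


def next_generation(pop, raw_fitness, rng, elite_frac=0.05):
    n = len(pop)
    ranked = sorted(range(n), key=lambda k: raw_fitness[k], reverse=True)
    n_elite = max(1, int(elite_frac * n))
    new_pop = [list(pop[k]) for k in ranked[:n_elite]]  # elites copied exactly
    weights = sigma_scale(raw_fitness)
    while len(new_pop) < n:
        mom, dad = rng.choices(pop, weights=weights, k=2)
        for child in two_point_crossover(list(mom), list(dad), rng):
            if len(new_pop) < n:
                new_pop.append(mutate(child, rng))
    return new_pop
```

A call such as `next_generation(pop, fitness, random.Random(0))` returns a population of the same size whose first entries are the unmodified elites.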

The focus of this paper is on different methods of 
choosing the evaluation problems (initial conditions). 
This work compares three methods of generating problem
difficulties (D_p's and D_v's) for the sample problems
at each generation: the standard evolutionary approach,
the gradient/developmental approach, and the
co-evolutionary approach (with 2 particular instantia- 
tions). Note that in all methods the specific problems 
(starting position and velocity) were randomly gener- 
ated by selecting the starting point and velocity vector 
from the circles described by D_p and D_v. Even if the
problem difficulties were identical across generations, 
the specific problems would be different. 

3.1 Standard Evolutionary Approach 

The first method is to randomly select each (D_p, D_v) from a uniform
distribution across [0, D_m], where D_m is the maximum problem
difficulty (typically 50). In this method the average problem
difficulty is constant at D_m/2 (see Fig. 2, heavy line). This first
method is meant to reflect the most common/standard practice where
throughout evolution the solutions are evaluated on the full set of
possible problems, or a fully complex target problem.

3.2 Gradient/Developmental Approach 

The second method is inspired by the developmental 
considerations discussed above. This method presents 
an increasingly difficult set of problems to the population
of solutions. The difficulties (D_p, D_v) at each generation
are chosen from a uniform distribution across
[0, D_m × G(t)], where G(t) is a monotonically increasing
function of generation number (t). Typically G(t)
is a simple linear increase from 0 at generation 1 to
D_m at the last generation (see Fig. 2, medium lines).
In this method the average problem difficulty increases 
monotonically over training. This second method re- 
flects the developmental theory (Elman, 1991; New- 
port, 1988) and intuitive heuristic, that hard problems 
are easier to learn if problem complexity starts off low 
and increases gradually over training as the compe- 
tency of the solution improves. 
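
The standard and gradient/developmental ways of drawing problem difficulties differ only in the upper bound of the uniform distribution, which the sketch below makes explicit. The names are hypothetical, and G(t) is normalized here to run from 0 to 1 so that the bound D_m × G(t) ends at D_m; that normalization is an assumption about the intended schedule. The run length of 250 generations is taken from the simulation parameters reported with the results.

```python
import random

D_MAX = 50.0        # maximum problem difficulty D_m (typical value in the text)
GENERATIONS = 250   # run length, from the reported simulation parameters


def linear_schedule(t, generations=GENERATIONS):
    """G(t): linear rise from 0 at generation 1 to 1 at the last generation."""
    return (t - 1) / (generations - 1)


def sample_difficulties(n, upper, rng):
    """Draw n problems (d_p, d_v), each scalar uniform on [0, upper]."""
    return [(rng.uniform(0.0, upper), rng.uniform(0.0, upper))
            for _ in range(n)]


def standard_problems(n, rng, d_max=D_MAX):
    """Standard approach: full difficulty range in every generation."""
    return sample_difficulties(n, d_max, rng)


def gradient_problems(n, t, rng, d_max=D_MAX, schedule=linear_schedule):
    """Gradient approach: the upper bound grows with generation number t."""
    return sample_difficulties(n, d_max * schedule(t), rng)
```

Passing a steeper or shallower `schedule` gives the fast-rise and slow-rise variants; choosing that function by hand is exactly the tuning burden the co-evolutionary method is meant to remove.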

3.3 Co-evolutionary Approach 

The third method co-evolves the difficulties of the evaluation
problems and the weights of the candidate
solutions. Like the solutions, problem difficulties are 
evolved with selection, cross-over and mutation. The 
average problem difficulty is under the control of the 
evolutionary algorithm in this method. A central fo- 
cus of this paper is to determine if this co-evolutionary 
algorithm can discover and optimize the hand-coded 
gradient/developmental method described in the previous
section (for actual behavior see Fig. 2, fine lines).

We explored two methods of evaluating the raw fit- 
nesses of the solutions and problems: absolute and 
relative. In the absolute method, the raw fitness of
a solution (F_{P_j}) is the sum of the difficulties of the
problems that it completed successfully, where N is
the number of problems, and S_j^i is 1 if problem i is
successfully solved by solution j and 0 otherwise (see
Eq. 1).


The relative method is similar to the “inverse-fitness”, or
“competitive fitness sharing” method used by previous researchers
(Paredis, 1998; Juille & Pollack, 1998; Rosin, 1997; Rosin & Belew,
1996). Fitness of the solutions is proportional to the number of
problems that they successfully solve, with the reward for each
problem being inversely proportional to the number of solutions that
solved it (see Eq. 2), a rough measure of how “easy” it is.

F_{P_j} = \sum_{i=1}^{N} \frac{S_j^i}{\sum_{k} S_k^i}    (2)

The fitness of each problem is inversely proportional
to the number of solutions that complete it success- 
fully, with a tractability constraint. A problem not 
successfully completed by any solution gets zero fit- 
ness (instead of the maximum fitness in the traditional 
“inverse- fitness” approach) . 


F_{S_i} = \frac{T^i}{\sum_{j} S_j^i}    (3)


Here T^i (the tractability of problem i) is 1 if any of
the solutions successfully completed problem i and 0 
otherwise. 
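
The absolute and relative raw fitnesses, both with the tractability constraint, can be computed from a success matrix along the following lines. This is a sketch of one reading of the definitions in this section: `solved[i][j]` plays the role of S_j^i, problem difficulty is taken to be the mean of its two scalars, and the function names are hypothetical. Problems that no solution solved contribute nothing to any solution and receive zero fitness themselves.

```python
def difficulty(problem):
    """Difficulty of a problem (d_p, d_v): the mean of its two scalars."""
    d_p, d_v = problem
    return 0.5 * (d_v + d_p)


def absolute_fitness(problems, solved):
    """solved[i][j] is 1 if solution j solved problem i, else 0."""
    n_solutions = len(solved[0])
    # Solution fitness: summed difficulty of the problems it solved.
    sol_fit = [sum(difficulty(p) * row[j] for p, row in zip(problems, solved))
               for j in range(n_solutions)]
    # Problem fitness: its difficulty, but only if it is tractable.
    prob_fit = [difficulty(p) if any(row) else 0.0
                for p, row in zip(problems, solved)]
    return sol_fit, prob_fit


def relative_fitness(solved):
    """Fitness sharing: each solved problem is worth 1 / (number of solvers)."""
    n_solutions = len(solved[0])
    counts = [sum(row) for row in solved]
    sol_fit = [sum(row[j] / c for row, c in zip(solved, counts) if c > 0)
               for j in range(n_solutions)]
    # Tractability constraint: unsolved problems get 0, not the maximum.
    prob_fit = [1.0 / c if c > 0 else 0.0 for c in counts]
    return sol_fit, prob_fit
```

For example, with three problems of difficulty 10, 30 and 50 where the first is solved by two solutions, the second by one, and the third by none, the relative problem fitnesses are 0.5, 1.0 and 0.0: the hardest problem is not rewarded because it is currently out of reach.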


The absolute raw fitness of solution j is given by Eq. 1:

F_{P_j} = \sum_{i=1}^{N} \frac{1}{2}(D_v^i + D_p^i) \times S_j^i    (1)

The absolute raw fitness of a problem is its difficulty
(\frac{1}{2}(D_v + D_p)) if it was completed successfully by at least
one solution and 0 otherwise, satisfying the tractability constraint.


4 Results

Two measures are displayed for each of the 3 evolutionary methods.
Displayed results are the average of 10 runs in each method, with the
same parameters in all runs. 3 The first reports the mean difficulty
of the problems (\frac{1}{N} \sum_{i=1}^{N} \frac{1}{2}(D_v^i + D_p^i)).
The second is a measure of the performance of the most fit solution.
In order to get a standardized measure of the solution performance,
the solution with the highest fitness in each generation was evaluated
on a standard set of 625 initial conditions selected so as to sample a
regular grid of initial positions and velocities.

P_j = 100 \times \left(1 - \frac{\sum_{i=1}^{T} (D_{pj}^i + D_{vj}^i)}{\sum_{i=1}^{T} (D_{p0}^i + D_{v0}^i)}\right)    (4)


The performance of the highest fitness network (P_j) was evaluated by
summing the errors in the final position (D_{pj}) and velocity
(D_{vj}) reached from the test set of initial conditions (indexed by
i, T total) with thrusters controlled by the highest fitness network
(network j). This sum was then compared to the final errors in
position and velocity reached with no thrusters firing (D_{p0} and
D_{v0}), and the proportion was normalized such that perfect
performance would correspond to a performance score of 100 (see
Eq. 4). It should be noted that the performance score is negative if
the given solution is worse (i.e. results in larger D_{pj}'s and
D_{vj}'s) than the 0 thrust case. In fact, a negative performance
score is overwhelmingly likely given a randomly generated solution
(only 11/1000 randomly generated solutions had a positive performance
score, and the average performance was -5000).

3 Simulation Parameters: 50 problems, 50 solutions, 2 seconds of
controller time, time step of .2 sec, 250 generations total, linear
solution networks, .05 elitism, .1 mutation rate, and mutation step
size is randomly drawn from [-1, 1].
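
The performance score can be sketched as a ratio of summed final errors, scaled so that a perfect controller scores 100 and the zero-thrust case scores 0. The code below is a hedged illustration: taking the ratio of the two sums (rather than averaging per-condition ratios) is one reading of the description, and the layout of the 625-point grid as 5 values on each of the four axes, symmetric about the origin, is purely an assumption. `performance_score` and `standard_grid` are hypothetical names.

```python
import itertools


def performance_score(final_errors, baseline_errors):
    """Score a controller against the zero-thrust baseline.

    final_errors, baseline_errors: lists of (position_error, velocity_error)
    pairs, one pair per test condition, with and without thrust.
    The baseline total must be non-zero.
    """
    controlled = sum(p + v for p, v in final_errors)
    baseline = sum(p + v for p, v in baseline_errors)
    return 100.0 * (1.0 - controlled / baseline)


def standard_grid(d_max=50.0, per_axis=5):
    """Regular grid of start states: per_axis values on x, y, vx and vy."""
    ticks = [-d_max + 2.0 * d_max * k / (per_axis - 1)
             for k in range(per_axis)]
    return [((x, y), (vx, vy))
            for x, y, vx, vy in itertools.product(ticks, repeat=4)]
```

A controller that doubles the total error relative to doing nothing scores -100, which illustrates how random controllers that fire the thrusters hard can reach strongly negative scores.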



Figure 2: Each line shows the problem difficulty at each 
generation for a given approach. Each line is averaged over 
all the problems in that generation and over 10 runs. 50 is
the maximum initial difficulty in the co-evolutionary runs, 
final difficulty in the gradient runs, and maximum diffi- 
culty throughout the standard run. The heavy line is the 
standard approach. The fine lines are co-evolutionary ap- 
proaches with problem difficulty evaluated absolutely (dot- 
ted) and relatively (solid). The medium lines are hand- 
tuned gradient approaches with a fast rise in task difficulty 
(dotted) and a slow rise in task difficulty (solid). See text 
for details. 

4.1 The Standard Approach 

In some parameter regimes, the standard case (selecting D_p and D_v
from a uniform distribution across [0, D_m] throughout evolution)
typically failed to find a generally successful solution over the
course of evolution (see Fig. 3, heavy line). The negative performance
of the solutions is probably due to “fortuitous”
initialization/solution matches, where the solution is unable to
generalize its successful performance to the test set of
initializations/problems. For example, a solution that continually
fires the left thruster might be successful in a generation where one
of the initial positions is just off to the right, but it (and its
offspring) will be unable to generalize that success to another random
sample of problems. In these 10 runs the standard algorithm came up
with a relatively poor solution with an average performance of 15.

Figure 3: Each line shows the performance of the best solution at each
generation for a given approach. Each line is the average of 10 runs,
and 100 is the maximum performance. The solid heavy line is the
standard approach. The fine lines are co-evolutionary approaches with
problem difficulty evaluated absolutely (dotted) and relatively
(solid). The medium lines are hand-tuned gradient approaches with a
fast rise in task difficulty (dotted) and a slow rise in task
difficulty (solid); see Fig. 2. The main parameters were held constant
in all runs. See text for details.

4.2 The Gradient/Developmental Approach 

The evolution of competent solutions is made much 
more robust by gradually increasing the average prob- 
lem difficulty over evolution (see Fig. 3, solid medium 
line). The general success of this approach is due to 
the fact that it can ensure that the problems are al- 
ways simple enough for some of the solutions to solve, 
enabling evolution to get a foothold in differentiating 
solution fitness based on performance. Only solutions 
that have been selected for many generations face dif- 
ficult problems late in a given evolutionary run. 

This approach has the shortcoming that the rate of problem difficulty
increase must be tuned to the given problem and rate of competency
growth in the solutions. If the difficulty of the problems is
increased too quickly, then it is no longer overwhelmingly likely that
some solutions succeed and, as in the standard case, the run may fail
to find a generally successful solution (see Fig. 3, dotted medium
line). In the case of a too-steep gradient, the gradient approach
yielded a final controller with a fitness of only 25. Generally
speaking, if the difficulty of the problem is increased too slowly,
then little evolutionary pressure is put on the solutions to have
general competency and suboptimal solutions will result (but see
Sec. 5.1 for discussion of this problem as an exception).


4.3 The Co-Evolutionary Approach 

The co-evolutionary method retains advantages of the 
gradient/developmental approach, but avoids the necessity of selecting the
schedule of increasing problem difficulty at an arbi- 
trary rate (See Fig. 3, fine lines). The co-evolutionary 
approach has the advantage of automatically adjusting 
problem difficulty to match solution competence (see 
Fig. 2). Even though the average problem difficulty 
starts off large, easy problems have much higher fitness 
early in evolution because they are the only problems 
that can be successfully completed by relatively incom- 
petent solutions. Easy problems tend to take over the 
population of problems just after a tractable problem 
is found (see Fig. 4), while the solutions are still relatively
incompetent. Problems tend to get harder over
evolution because they are rewarded for being solvable 
only by a few solutions (relative) or for being more 
difficult by some absolute measure (absolute). Any 
problem that increases in difficulty too quickly will be 
penalized because it will not be successfully completed 
by any of the solutions. 



Figure 4: The average difficulty of the population of prob- 
lems during a representative run (taken from the 10 aver- 
aged runs) using co-evolution with selection for absolute 
problem difficulty. Note the random search, followed by 
problem simplification and a gradual increase in problem 
difficulty. See text for details. 


5 Discussion 

This paper presents an approach to using co-evolution 
to simplify complex problems. By rewarding a co- 
evolved population of problems for being at the edge 
of what is currently solvable to the population of solu- 
tions, the method generates a usable fitness gradient 
for the solutions while encouraging general solution 
competency at difficult problems. This approach takes 
some small steps toward making co-evolutionary algo- 
rithms more applicable to a difficult and important 
class of problems. The results that were presented 
demonstrate that, in some domains, the approach can 
be more effective than a traditional evolutionary ap- 
proach and more flexible than a hand-coded approach. 

5.1 Limitations 

This work has some limitations that are important for 
proper interpretation. 

First, initial success in the co-evolutionary and stan- 
dard approaches is simply probabilistic. Even with the 
tuned parameters in the simulations reported above, 
several generations often pass without any successful 
solution/problem pairings. Indeed this difficulty was
explicitly chosen, because if the problem is made too
simple (e.g. by increasing the error threshold for successful
performance) then there is a sufficient fitness
gradient for the standard approach to perform as well 
as the co-evolutionary approach. During these un- 
successful generations, the algorithm does a random 
search for solvable problems, and all but the elite so- 
lutions undergo random evolution or genetic drift. In 
a very difficult problem domain, randomly generated 
solutions will almost never successfully solve randomly 
generated tasks. This issue could be addressed by 
seeding the initial population with simple problems 
that are thought to be applicable to the fully complex 
problem, thus ensuring some success in even a random 
population of solutions. 

Second, the algorithm here involves problem simplification
instead of problem decomposition. In the case
of simplification it is straightforward to generate estimates
of problem difficulty or problem match to a target
objective function; therefore it is easy to evaluate
problems on their absolute difficulty. In problems that 
are compositional, hierarchical, or otherwise complex 
this assignment of absolute difficulty is not as straight- 
forward. Unfortunately, it is also hard to get a useful 
measure of the intrinsic difficulty in complex problems. 
The issue is that an evolving problem must be difficult 
in the same way as the ultimate target problem, and 
usually there are many other ways to be difficult. The
challenge is to find a problem representation that allows
simple evaluation of the similarity or applicability
of candidate problems to the target problem. Such a 
representation allows an absolute difficulty measure to 
help guide the explorations generated by the intrinsic 
difficulty measure. 

Third, the fact that this domain provides only for 
problem simplification ensures that solutions that suc- 
ceed at simple problems will tend to succeed at hard 
problems as well. The most vivid illustration of this
fact is that runs with the gradient/developmental approach
and with a very low maximum problem difficulty
(D_m) evolve a solution with competence nearly
matching a gradient/developmental approach with a
relatively high D_m. The result is that there is relatively
little intrinsic pressure for the problems to become
more difficult. This fact limits the usefulness of
this problem domain for study of these co-evolutionary 
mechanisms. 

5.2 Future work 

We plan to test this co-evolutionary approach in prob- 
lem domains that avoid the limitations mentioned 
above. One candidate domain is co-evolving analog 
filters and their target frequency response. Previous 
work on the evolution of simple analog filters has found 
the efficiency of evolutionary search to be highly de- 
pendent on a proper choice of fitness functions (Lohn & 
Colombano, 1998). In addition, somewhat complex filters,
like passive cross-over filters, are relatively difficult
to design and optimize by hand. This well-explored do- 
main should allow us to test the ability of co-evolution 
to provide a usable gradient through simplification and 
decomposition. 

A second candidate domain is co-evolving a gait controller
for a walking robot. Previous work has found
that decomposing a locomotion problem into behaviors
provides a many-fold speed-up in controller evolution
(Gruau, 1996). The chore of deciding how to usefully
decompose a robotic control task is generally not
straightforward and thus far has depended on the in- 
sights and patience of a human programmer. We plan 
to use a co-evolutionary approach to evolve a controller 
for a semi-rigid walking robot under current develop- 
ment at NASA Ames Research Center. 